Gaussian processes (GPs) are known to provide accurate predictions and uncertainty estimates by capturing similarities between data points through their kernel. However, traditional GP kernels are not very effective at capturing similarity between high-dimensional data points. Neural networks can be used to learn good representations that encode the complex structure of high-dimensional data and can serve as inputs to the GP kernel. However, the huge data requirements of neural networks make this approach ineffective in small-data settings. To resolve the conflicting demands of representation learning and data efficiency, we propose learning deep kernels over probabilistic embeddings obtained with a probabilistic neural network. Our approach maps high-dimensional data to probability distributions in a low-dimensional subspace and then computes a kernel between these distributions to capture similarity. To enable end-to-end learning, we derive a functional gradient descent procedure for training the model. Experiments on a variety of datasets show that our approach outperforms the state of the art in GP kernel learning in both supervised and semi-supervised settings. We also extend our approach to other small-data regimes such as few-shot classification, outperforming previous methods on the mini-ImageNet and CUB datasets.
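A minimal sketch of the central computation under our reading of the abstract (illustrative, not the authors' code): an encoder outputs a diagonal Gaussian embedding for each input, and the kernel between two inputs is the expected RBF kernel between their embedding distributions, which has a closed form for Gaussians. The layer sizes and lengthscale are placeholder choices.

```python
import torch
import torch.nn as nn

class ProbabilisticEncoder(nn.Module):
    """Maps a high-dimensional input to a diagonal Gaussian embedding."""
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, emb_dim)        # embedding mean
        self.log_var = nn.Linear(128, emb_dim)   # embedding log-variance

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h).exp()

def expected_rbf_kernel(mu1, var1, mu2, var2, lengthscale=1.0):
    """Closed-form E[k_RBF(x, y)] for x ~ N(mu1, var1), y ~ N(mu2, var2)."""
    l2 = lengthscale ** 2
    # Pairwise quantities, shape (n, m, d) by broadcasting.
    s = var1.unsqueeze(1) + var2.unsqueeze(0) + l2
    d2 = (mu1.unsqueeze(1) - mu2.unsqueeze(0)) ** 2
    # Product over dimensions of the 1-D expected-RBF factors.
    return (torch.sqrt(l2 / s) * torch.exp(-d2 / (2 * s))).prod(dim=-1)

encoder = ProbabilisticEncoder(in_dim=784, emb_dim=8)
x = torch.randn(5, 784)
mu, var = encoder(x)
K = expected_rbf_kernel(mu, var, mu, var)  # (5, 5) kernel matrix for a GP
print(K.shape)
```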
Large pre-trained models, such as BERT, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks. However, it is difficult to obtain large quantities of supervised data due to limited resources and time. In light of this, a significant amount of research has focused on adapting large pre-trained models to diverse downstream tasks via fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks, and have been successfully used in a wide variety of applications. Many normalization techniques have been proposed, but their success in low-resource downstream NLP and speech tasks is limited. One reason is the inability of the rescaling parameters of normalization to capture expressiveness. We propose Kullback-Leibler (KL) regularized normalization (KL-Norm), which makes the normalized data well behaved and helps generalization: it reduces over-fitting, generalizes well to out-of-domain distributions, and removes irrelevant biases and features, with a negligible increase in model parameters and memory overhead. Detailed experimental evaluation on multiple low-resource NLP and speech tasks demonstrates the superior performance of KL-Norm compared to other popular normalization and regularization techniques.
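The abstract does not spell out the layer's internals, so the following is a hedged sketch of one plausible reading, not the authors' implementation: a LayerNorm-style standardization whose rescaled output is treated as the mean of a Gaussian posterior, with the KL divergence from a standard-normal prior added to the task loss as the regularizer. The per-feature variance parameterization and all names are our assumptions.

```python
import torch
import torch.nn as nn

class KLNorm(nn.Module):
    """LayerNorm-like layer with a KL penalty toward a standard-normal prior."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))     # rescaling weight
        self.beta = nn.Parameter(torch.zeros(dim))     # rescaling bias
        self.log_var = nn.Parameter(torch.zeros(dim))  # posterior log-variance

    def forward(self, x):
        # Standardize over the feature dimension, as in LayerNorm.
        mean = x.mean(-1, keepdim=True)
        var = x.var(-1, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        mu = self.gamma * x_hat + self.beta
        # Sample via the reparameterization trick during training only.
        out = mu + torch.randn_like(mu) * self.log_var.exp().sqrt() if self.training else mu
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over batch and features.
        kl = 0.5 * (mu ** 2 + self.log_var.exp() - self.log_var - 1).mean()
        return out, kl

layer = KLNorm(dim=16)
h, kl = layer(torch.randn(4, 16))
print(h.shape, float(kl))
# Training would minimize: task_loss + kl_weight * kl, with kl_weight tunable.
```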
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6\% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
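As a concrete, heavily simplified illustration of the two pre-training objectives named above (not the Mu$^{2}$SLAM code), the toy functions below build an encoder MLM example and a T5-style masked-denoising example from one sequence of discrete tokens, such as quantized speech units; the masking rates, sentinel format, and span length are placeholders, not the actual configuration.

```python
import random

MASK, SENTINEL = "<mask>", "<extra_id_{}>"

def mlm_example(tokens, p=0.15):
    """Encoder MLM: replace ~p of tokens with <mask>; targets are the originals."""
    inp, tgt = [], []
    for t in tokens:
        if random.random() < p:
            inp.append(MASK)
            tgt.append(t)        # predict the original token at masked slots
        else:
            inp.append(t)
            tgt.append("<pad>")  # ignored by the loss
    return inp, tgt

def t5_denoising_example(tokens, span=3, p=0.15):
    """Seq2seq denoising: drop contiguous spans, mark them with sentinels;
    the decoder must reconstruct the dropped spans in order."""
    inp, tgt, i, sid = [], [], 0, 0
    while i < len(tokens):
        if random.random() < p / span:      # expected masking rate ~ p
            inp.append(SENTINEL.format(sid))
            tgt += [SENTINEL.format(sid)] + tokens[i:i + span]
            sid += 1
            i += span
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt

toks = ["spk_12", "spk_7", "spk_7", "spk_41", "spk_3", "spk_19", "spk_5"]
print(mlm_example(toks))
print(t5_denoising_example(toks))
```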
End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, the development of end-to-end TTS for Indian languages lags behind in terms of quality. The challenges involved are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary sizes. In this paper, we investigate fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural-sounding Sanskrit speech in low-resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good knowledge of spoken Sanskrit. This is a notable result, considering that only 2.5 hours of speech data were used.
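A hedged sketch of the transfer recipe this implies, with a tiny stand-in network in place of Tacotron2 (every name here is illustrative, not the authors' setup): load the English-pretrained checkpoint, drop weights whose shapes depend on the symbol set, and fine-tune the rest at a low learning rate.

```python
import torch
import torch.nn as nn

class TinyTTS(nn.Module):  # stand-in for Tacotron2
    def __init__(self, n_symbols, mel_dim=80):
        super().__init__()
        self.embedding = nn.Embedding(n_symbols, 64)          # symbol-set dependent
        self.decoder = nn.GRU(64, mel_dim, batch_first=True)  # transferable

    def forward(self, text_ids):
        out, _ = self.decoder(self.embedding(text_ids))
        return out

pretrained = TinyTTS(n_symbols=148)          # "English" checkpoint
torch.save(pretrained.state_dict(), "en.pt")

model = TinyTTS(n_symbols=70)                # new Sanskrit symbol set
state = torch.load("en.pt")
ref = model.state_dict()
# Keep only weights whose shapes still match (drops the embedding table).
state = {k: v for k, v in state.items() if v.shape == ref[k].shape}
model.load_state_dict(state, strict=False)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # low LR for fine-tuning
text = torch.randint(0, 70, (2, 30))                 # dummy Sanskrit batch
mel_target = torch.randn(2, 30, 80)
loss = nn.functional.mse_loss(model(text), mel_target)
opt.zero_grad(); loss.backward(); opt.step()
```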
Our education system comprises a series of curricula. For example, when we learn mathematics at school, we progress in order from addition to multiplication and, later, to integration. Delineating a curriculum for teaching either a human or a machine shares the underlying goal of maximizing positive knowledge transfer from early to later tasks and minimizing forgetting of the early tasks. Here, we exhaustively surveyed the effect of curricula on existing continual learning algorithms in the class-incremental setting, where algorithms must learn classes one at a time from a continuous stream of data. We observed that, across a breadth of possible class orders (curricula), the chosen curriculum influences the retention of information and that this effect is not just a product of stochasticity. Further, as an initial effort toward automated curriculum design, we proposed a method capable of designing and ranking effective curricula based on inter-class feature similarities. We compared the predicted curricula against empirically determined effective curricula and observed significant overlaps between the two. To support the study of a curriculum designer, we conducted a series of human psychophysics experiments and contributed a new continual learning benchmark in object recognition. We assessed the degree of agreement in effective curricula between humans and machines. Surprisingly, our curriculum designer successfully predicts an optimal set of curricula that is effective for human learning. There are many considerations in curriculum design, such as timely student feedback and learning with multiple modalities. Our study is the first attempt to set a standard framework for the community to tackle the problem of teaching humans and machines to learn to learn continuously.
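A hedged sketch of the idea behind such a designer (our illustration, not the authors' method): represent each class by its mean feature vector, score a candidate class order by the similarity of consecutive classes, and rank all candidate curricula by that score. Whether high or low consecutive similarity is preferable is an empirical question; the sketch ranks by high similarity.

```python
import numpy as np
from itertools import permutations

def class_prototypes(features, labels):
    """Mean feature vector per class; features: (n, d), labels: (n,)."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def curriculum_score(order, protos):
    """Sum of cosine similarities between consecutive classes in the order."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sum(cos(protos[a], protos[b]) for a, b in zip(order, order[1:]))

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 32))       # stand-in features from a backbone
labels = rng.integers(0, 4, size=200)
protos = class_prototypes(feats, labels)
ranked = sorted(permutations(protos), key=lambda o: curriculum_score(o, protos),
                reverse=True)
print("best predicted curriculum:", ranked[0])
```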
Accurate traffic forecasting is vital for intelligent transportation systems. Although many deep learning models have achieved state-of-the-art performance for 1-hour traffic forecasting, long-term forecasting spanning multiple hours remains a major challenge. Moreover, most existing deep learning traffic forecasting models are black boxes, raising additional concerns about explainability and interpretability. We develop Graph Pyramid Autoformer (X-GPA), an explainable attention-based spatial-temporal graph neural network that uses a novel pyramid autocorrelation attention mechanism. It learns from long temporal sequences on graphs and improves long-term traffic forecasting accuracy. Our model achieves up to 35% better long-term traffic forecasting accuracy than several state-of-the-art methods. The attention-based scores from the X-GPA model provide spatial and temporal explanations grounded in the traffic dynamics, which shift between normal and rush-hour traffic and between weekday and weekend traffic.
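A hedged sketch of the kind of autocorrelation attention such a mechanism builds on (illustrative, not the X-GPA code): autocorrelation across time lags is computed efficiently with the FFT, the top-k lags are kept, and time-delayed copies of the values are aggregated with softmax weights.

```python
import torch

def autocorrelation_attention(q, k, v, top_k=4):
    """q, k, v: (batch, length, channels); returns (batch, length, channels)."""
    L = q.size(1)
    # Wiener-Khinchin: autocorrelation = inverse FFT of the cross power spectrum.
    fq = torch.fft.rfft(q, dim=1)
    fk = torch.fft.rfft(k, dim=1)
    corr = torch.fft.irfft(fq * fk.conj(), n=L, dim=1)  # (B, L, C)
    # Score each lag by its mean correlation and keep the top-k lags.
    scores = corr.mean(dim=(0, 2))                      # (L,)
    weights, lags = torch.topk(scores, top_k)
    weights = torch.softmax(weights, dim=0)
    # Aggregate time-delayed versions of v.
    out = torch.zeros_like(v)
    for w, lag in zip(weights, lags):
        out += w * torch.roll(v, shifts=-int(lag), dims=1)
    return out

x = torch.randn(2, 96, 8)   # e.g. 96 five-minute steps at 8 road sensors
y = autocorrelation_attention(x, x, x)
print(y.shape)              # torch.Size([2, 96, 8])
```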
In this paper, we propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC), designing an agent built completely from powerful predictive coding/processing circuits that embody the principles of planning. Concretely, we craft an adaptive agent system, which we call active predictive coding (ActPC), that balances internally generated epistemic signals (meant to encourage intelligent exploration) against internally generated instrumental signals (meant to encourage goal-seeking behavior), ultimately learning to control several simulated robotic systems as well as a complex robotic arm in a realistic robotics simulator (the Surreal Robotics Suite) to solve a block-lifting task and a pick-and-place problem. Notably, our experimental results demonstrate that the proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals and is competitive with, or outperforms, several powerful backprop-based RL approaches.
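A toy illustration (not the ActPC implementation) of balancing the two internally generated signals: an epistemic term rewarding states the agent's generative model predicts poorly, and an instrumental term rewarding progress toward a goal; the fixed weights here stand in for the agent's adaptive balance.

```python
import numpy as np

def epistemic_signal(pred_next_obs, next_obs):
    """Prediction error of the agent's generative model: high when surprised."""
    return float(np.mean((pred_next_obs - next_obs) ** 2))

def instrumental_signal(next_obs, goal):
    """Negative distance to the goal state: high when goal-seeking succeeds."""
    return -float(np.linalg.norm(next_obs - goal))

def total_signal(pred_next_obs, next_obs, goal, alpha=0.5, beta=1.0):
    # alpha and beta trade exploration off against exploitation; the real
    # agent adapts this balance rather than fixing it.
    return (alpha * epistemic_signal(pred_next_obs, next_obs)
            + beta * instrumental_signal(next_obs, goal))

obs, goal = np.zeros(3), np.ones(3)
print(total_signal(pred_next_obs=obs + 0.3, next_obs=obs + 0.1, goal=goal))
```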
Hyperparameter optimization in machine learning is often performed using naive techniques that only find an approximate set of hyperparameters. Although techniques such as Bayesian optimization conduct an intelligent search over a given domain of hyperparameters, they do not guarantee an optimal solution. A major drawback of most of these approaches is the exponential growth of the search domain with the number of hyperparameters, which increases the computational cost and makes the approaches slow. The hyperparameter optimization problem is inherently a bilevel optimization task, and some studies have attempted bilevel solution methodologies for it. However, these studies assume a unique set of model weights minimizing the training loss, an assumption that is typically violated by deep learning architectures. This paper discusses a gradient-based bilevel method that addresses these drawbacks for the hyperparameter optimization problem. The proposed method can handle continuous hyperparameters, for which we have chosen the regularization hyperparameter in our experiments. The method guarantees convergence to the set of optimal hyperparameters, which we prove theoretically in this study. The idea is based on approximating the lower-level optimal value function using Gaussian process regression, which reduces the bilevel problem to a single-level constrained optimization task that is solved with the augmented Lagrangian method. We conduct an extensive computational study on the MNIST and CIFAR-10 datasets with multilayer perceptron and LeNet architectures to confirm the efficiency of the method. A comparative study against grid search, random search, Bayesian optimization, and Hyperband shows that the proposed algorithm converges with lower computation and yields models that generalize better on the test set.
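A hedged sketch of this reduction on a toy ridge-regression problem (illustrative, not the paper's algorithm): sample a few regularization weights, record the lower-level optimal training loss for each, fit a Gaussian process surrogate of that value function, then minimize validation loss subject to the lower-level optimality constraint via an augmented-Lagrangian-style penalty. The closed-form lower-level solve stands in for the paper's joint gradient updates of weights and hyperparameters.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X, y = rng.normal(size=(40, 5)), rng.normal(size=40)    # training split
Xv, yv = rng.normal(size=(20, 5)), rng.normal(size=20)  # validation split

def lower_level(lam):
    """Ridge regression: closed-form weights and optimal training objective."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    return w, np.sum((X @ w - y) ** 2) + lam * w @ w

# 1) Fit a GP surrogate V_hat(lam) of the lower-level optimal value function.
lams = np.linspace(0.01, 2.0, 8)
V = np.array([lower_level(l)[1] for l in lams])
gp = GaussianProcessRegressor().fit(lams.reshape(-1, 1), V)

# 2) Single-level problem: minimize validation loss over lam subject to the
#    lower-level optimality constraint, via an augmented-Lagrangian penalty.
def penalized_val_loss(lam, mu=50.0, eps=1e-3):
    w, train_obj = lower_level(lam)   # stand-in for jointly updating w
    v_hat = gp.predict(np.array([[lam]]))[0]
    violation = max(0.0, train_obj - v_hat - eps)
    return np.mean((Xv @ w - yv) ** 2) + mu * violation ** 2

grid = np.linspace(0.01, 2.0, 200)
best = grid[np.argmin([penalized_val_loss(l) for l in grid])]
print("selected regularization weight:", round(best, 3))
```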
Session-based recommender systems capture users' short-term interests within a session. Session contexts (i.e., users' high-level interests or intents within a session) are not explicitly given in most datasets, and implicitly inferring a session's context as an aggregate of item-level attributes is crude. In this paper, we propose ISCON, which implicitly contextualizes sessions. ISCON first generates implicit contexts for sessions by building a session-item graph, learning graph embeddings, and clustering to assign sessions to contexts. ISCON then trains a session-context predictor and uses the predicted contexts' embeddings to enhance next-item prediction accuracy. Experiments on four datasets show that ISCON achieves superior next-item prediction accuracy over state-of-the-art models. A case study of ISCON on the Reddit dataset confirms that the assigned session contexts are distinctive and meaningful.
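A hedged sketch of the contextualization pipeline (our illustration, not the ISCON code; a truncated SVD stands in for the paper's graph embeddings): build the session-item graph as an incidence matrix, embed the sessions, and cluster the embeddings so that cluster ids serve as implicit session contexts.

```python
import numpy as np
from sklearn.cluster import KMeans

sessions = [[0, 1, 2], [1, 2, 3], [7, 8], [8, 9, 7], [0, 3]]  # item ids
n_items = 10

# Session-item incidence matrix (the bipartite session-item graph).
A = np.zeros((len(sessions), n_items))
for s, items in enumerate(sessions):
    A[s, items] = 1.0

# Low-dimensional session embeddings via truncated SVD of the graph matrix.
U, S, _ = np.linalg.svd(A, full_matrices=False)
session_emb = U[:, :3] * S[:3]

# Cluster the embeddings; each cluster id is an implicit session context,
# which a downstream predictor can then be trained to infer for new sessions.
contexts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(session_emb)
print("implicit context per session:", contexts)
```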
The functionality of a deep learning (DL) model can be stolen via model extraction, in which an attacker obtains a surrogate model by exploiting the prediction API of the original model. In this work, we propose a novel watermarking technique called DynaMarks to protect the intellectual property (IP) of DL models against such extraction attacks in a black-box setting. Unlike existing approaches, DynaMarks does not alter the training process of the original model; instead, it embeds a watermark into the surrogate model by dynamically changing the output responses of the original model's prediction API, based on certain secret parameters, at inference time. Experimental results on the Fashion-MNIST, CIFAR-10, and ImageNet datasets demonstrate the efficacy of the DynaMarks scheme in watermarking surrogate models while preserving the accuracy of the original models deployed on edge devices. We also conduct experiments to evaluate the robustness of DynaMarks against various watermark removal strategies, enabling a DL model owner to reliably prove model ownership.
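A hedged sketch of the dynamic-response idea (our illustration, not the DynaMarks scheme; the secret parameters and perturbation rule are invented for the example): a wrapper around the prediction API perturbs the returned probabilities for a secret, input-dependent subset of queries, never flipping the top-1 label, so a surrogate distilled from these responses inherits a detectable bias while normal accuracy is preserved.

```python
import hashlib
import numpy as np

SECRET_SEED = b"owner-secret"

def watermarked_api(model_probs, x, strength=0.1):
    """model_probs: original softmax output for query x (1-D array)."""
    # Secret, input-dependent decision: perturb only a hidden subset of queries.
    h = hashlib.sha256(SECRET_SEED + x.tobytes()).digest()
    if h[0] % 4 != 0:                  # ~25% of queries carry the watermark
        return model_probs
    # Deterministic perturbation derived from the secret hash.
    rng = np.random.default_rng(int.from_bytes(h[1:9], "little"))
    noise = rng.dirichlet(np.ones_like(model_probs))
    probs = (1 - strength) * model_probs + strength * noise
    if probs.argmax() != model_probs.argmax():  # never flip the top-1 label
        return model_probs
    return probs / probs.sum()

x = np.random.default_rng(0).normal(size=(28, 28)).astype(np.float32)
print(watermarked_api(np.array([0.7, 0.2, 0.1]), x))
```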